An Empirical Comparison of Exact Nearest Neighbour Algorithms
نویسندگان
چکیده
Nearest neighbour search (NNS) is an old problem that is of practical importance in a number of fields. It involves finding, for a given point q, called the query, one or more points from a given set of points that are nearest to the query q. Since the initial inception of the problem a great number of algorithms and techniques have been proposed for its solution. However, it remains the case that many of the proposed algorithms have not been compared against each other on a wide variety of datasets. This research attempts to fill this gap to some extent by presenting a detailed empirical comparison of three prominent data structures for exact NNS: KD-Trees, Metric Trees, and Cover Trees. Our results suggest that there is generally little gain in using Metric Trees or Cover Trees instead of KD-Trees for the standard NNS problem.
منابع مشابه
Scaling up the Accuracy of K -nearest-neighbour Classifiers: a Naive-bayes Hybrid
k-nearest-neighbour (KNN) has been widely used as an effective classification model. In this paper, we summarize three main shortcomings confronting KNN and then single out three categories of approaches for overcoming its three main shortcomings. After reviewing some algorithms in each category, we presented a hybrid algorithm called dynamic k-nearest-neighbour naive Bayes with attribute weigh...
متن کاملLimit theory for the random on-line nearest-neighbor graph
In the on-line nearest-neighbour graph (ONG), each point after the first in a sequence of points in R is joined by an edge to its nearest-neighbour amongst those points that precede it in the sequence. We study the large-sample asymptotic behaviour of the total power-weighted length of the ONG on uniform random points in (0, 1)d. In particular, for d = 1 and weight exponent α > 1/2, the limitin...
متن کاملPersonal Credit Score Prediction using Data Mining Algorithms (Case Study: Bank Customers)
Knowledge and information extraction from data is an age-old concept in scientific studies. In industrial decision-making processes, the application of this concept gives rise to data-mining opportunities. Personal credit scoring is an ever-vital tool for banking systems in order to manage and minimize the inherent risks of the financial sector, thus, the design and improvement of credit scorin...
متن کاملA comparison of k-nearest neighbour algorithms with performance results on speech data
The (k-)nearest neighbour problem is well known in a wide range of areas. Many algorithms to tackle this problem suffer from the “curse of dimensionality” which means that the execution time grows exponentially with increasing dimension. Therefore, it is important to have efficient algorithms for the problem. In this report, some well known tree-based algorithms for the k-nearest neighbour are ...
متن کاملOptimal weighted nearest neighbour classifiers
We derive an asymptotic expansion for the excess risk (regret) of a weighted nearest-neighbour classifier. This allows us to find the asymptotically optimal vector of nonnegative weights, which has a rather simple form. We show that the ratio of the regret of this classifier to that of an unweighted k-nearest neighbour classifier depends asymptotically only on the dimension d of the feature vec...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007